Filtering placement of advertisements

ABSTRACT

This invention concerns optimal ad selection for Web pages by selecting and updating an attribute set, obtaining and updating an ad-attribute profile, and optimally choosing the next ad. The present invention associates a set of attributes with each customer. The attributes reflect the customers' interests and they incorporate the characteristics that impact ad selection. Similarly, the present invention associates with each ad an ad-attribute profile in order to calculate a customer's estimated ad selection probability and measure the uncertainty in that estimate. An ad selection algorithm optimally selects which ad to show based on the click probability estimates and the uncertainties regarding these estimates.

RELATIONSHIP TO PRIOR APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 60/164,253, titled “Optimal Internet Ad Placement Technology,” filed Nov. 8, 1999.

BACKGROUND OF THE INVENTION

This invention relates generally to the allocation (e.g., as in a market or exchange) of the supply of a class of products/services with the demand for a class of products/services in an optimal manner (i.e., a system-wide best solution, since the values of different allocation strategies may vary significantly) that quantifies and accounts for the uncertainty surrounding the supply and demand of different products/services. More particularly, the present invention comprises a system and method for the optimal placement of ads on Web pages.

Optimal ad placement has become a critical competitive advantage in the Internet advertising business. Consumers are spending an ever-increasing amount of time online looking for information. The information, provided by Internet content providers, is viewed on a page-by-page basis. Each page can contain written and graphical information as well as one or more ads. Key advantages of the Internet, relative to other information media, are that each page can be customized to fit a customer profile and ads can contain links to other Internet pages. Thus, ads can be directly targeted at different customer segments and the ads themselves are direct connections to well-designed Internet pages. Although the present example has been described with respect to traditional Web browsing on a Web page, the same principles apply for any content, including information or messages, as well as advertisements, delivered over any Internet enabled distribution channel, such as via e-mail, wireless devices (including, but not limited to, phones, pagers, PDAs, desktop displays, and digital billboards), or other media, such as ATM terminals.

Therefore, as used herein, the term “ad” is also meant to include any content, including information or messages, as well as advertisements, such as, but not limited to, Web banners, product offerings, special non-commercial or commercial messages, or any other sort of displayed or audio information.

The terms “Web page,” “Web site,” and “site” are meant to include any sort of information display or presentation over an Internet enabled distribution channel that may have customizable areas (including the entire area) and may be visual, audio, or both. They may be segmented and/or customized by factors such as time and location. The term “Internet browser” is any means that decodes and displays the above-defined Web pages or sites, whether by software, hardware, or utility, including diverse means not typically considered as a browser, such as games.

The term “Internet” is meant to include all TCP/IP based communication channels, without limitation to any particular communication protocol or channel, including, but not limited to, e-mail, News via NNTP, and the WWW via HTTP and WAP (using, e.g., HTML, DHTML, XHTML, XML, SGML, VRML, ASP, CGI, CSS, SSI, Flash, Java, JavaScript, Perl, Python, Rexx, SMIL, Tcl, VBScript, HDML, WML, WMLScript, etc.).

The term “customer” or “user” refers to any consumer, viewer, or visitor of the above-defined Web pages or sites and can also refer to the aggregation of individual customers into certain groupings. “Clicks” and “click-thru-rate” or “CTR” refer to any sort of definable, trackable, and/or measurable action or response that can occur via the Internet and can include any desired action or reasonable measure of performance activity by the customer, including, but not limited to, mouse clicks, impressions delivered, sales generated, and conversions from visitors to buyers. Additionally, references to customers “viewing” ads are meant to include any presentation, whether visual, aural, or a combination thereof.

The term “revenue” refers to any meaningful measure of value, including, but not limited to, revenue, profits, expenses, customer lifetime value, and net present value (NPV).

The Internet ad placement technology of the present invention provides an optimal strategic framework for selecting which ad a customer will view next. It maximizes the overall expected ad placement revenue (or any other measure of value), trading off the desire for learning with revenue generation. The technology can be executed in “real-time” and updates the strategy space for every customer.

At its core, the problem is to show the right ad to the right customer. Ad placements are compensated based on the number of successful responses that they generate. This usually means that compensation occurs every time a customer responds to (e.g., clicks) an ad. Customers respond to ads according to their interests and demands. Thus, a first key necessity is to obtain a reliable characteristic profile of each customer. Only with information about the customer can ads be provided that are targeted towards each customer. Second, there is a need to estimate how different customers will react to different ads. That is, a customer-ad response relation is required. Finally, there is a need for an ad placement technology that optimally decides which ad to show. At the instant a customer opens a page, it is necessary to place an ad. The ad placement technology must incorporate the customer's likely response to each ad and the financial gains resulting from a customer's selection of an ad.

A successful ad placement technology must overcome several critical complications. First, the ad placement algorithm must be sufficiently fast to ensure “real-time” placement. Second, a key element of the technology is its ability to learn through continuous updating. Little information is available about new ads. However, as ads are placed, it can be learned how they relate to various customer profiles. Thus, the technology should both be able to learn and trade off learning versus revenue generation. Finally, the ad placement technology must be able to detect ineffective ads and incorporate minimum and maximum ad placement and ad selection constraints.

BRIEF SUMMARY OF THE INVENTION

This invention concerns optimal ad selection for Internet-delivered ads, such as for Web pages, by selecting and updating an attribute set, obtaining and updating an ad-attribute profile, and optimally choosing the next ad. The present invention associates a set of attributes with each customer. The attributes reflect the customers' interests and they incorporate the characteristics that impact ad selection. Similarly, the present invention associates with each ad an ad-attribute profile in order to calculate a customer's estimated ad selection probability and measure the uncertainty in that estimate. An ad selection algorithm optimally selects which ad to show based on the click probability estimates and the uncertainties regarding these estimates.

It is therefore an object of the present invention to integrate the optimization and scheduling of web-based ad serving.

It is another object of the present invention to provide an optimal strategic framework for selecting which ad a customer will view next.

It is also an object of the present invention to maximize the overall expected ad placement revenue (or any other measure of value), trading off the desire for learning with revenue generation.

It is another object of the present invention to place ads on Web sites in such a way as to maximize the overall value for the ad serving entity, whether based on impressions, clicks, conversions, or combinations thereof.

It is an object of the present invention to provide an ad placement algorithm that is sufficiently fast to ensure “real-time” ad placement.

It is an object of the present invention to provide an ad placement technology that has the ability to learn through continuous updating.

It is another object of the present invention to provide an ad placement technology that is able to detect ineffective ads and incorporate minimum and maximum ad placement and ad selection constraints.

It is an object of the present invention to provide an estimate of the probability a customer will click an ad by estimating a principal component vector as well as the ad's click probabilities.

It is yet another object of the present invention to provide binomial updating of click probabilities using principal components, as well as category restrictions and ad blocking.

It is yet another object of the present invention to provide automatic clustering of Web pages in a manner that effectively improves overall Click-Thru-Rates.

It is another object of the present invention to provide optimal delivery of content, messages, and/or ads to customers by any Internet enabled distribution channel.

It is a final object of the present invention to optimize ad placement across a diverse set of media, such as banners, e-mail, and wireless, in an integrated manner via an allocator.

These and other objectives of the present invention will become apparent from a review of the detailed description that follows.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates the possible use of the present invention in a prior art direct marketing system.

FIG. 2 illustrates a first embodiment of the present invention for brand name and mass appeal products.

FIG. 3 illustrates a second embodiment of the present invention for lots and niche products.

FIG. 4 illustrates a schematic of the present invention.

FIG. 5 illustrates the Integrated Channel Management system of the present invention.

FIG. 6 illustrates a schematic of the system of the present invention.

FIG. 7 illustrates a schematic of the process of the present invention.

FIG. 8 illustrates a matching of supply and demand for advertising on Internet enabled distribution channels.

DETAILED DESCRIPTION OF THE INVENTION

The present invention comprises a system and method of optimal ad placement. This invention divides the optimal ad selection problem into three parts: (1) how to select and update the attribute set, (2) how to obtain and update the ad-attribute profile, and (3) how to optimally choose the next ad. For purposes of this description, the application of the present invention will be illustrated with respect to reconciling the supply of Web pages with the demand for ads on those Web pages in an optimal manner that maximizes revenue. It is assumed that each Web page can only promote one ad at a time, although that is not a limitation of the present invention. Furthermore, the ad provider pays on a per click (ad selection) basis. A typical employment of the invention is illustrated in FIG. 1, wherein customer and client (ad) data 110 is input, turned into information 120 for modeling and used for ad serving 130, as illustrated in FIG. 8.

The present invention associates a set of attributes with each customer. The attributes reflect the customers' interests and they incorporate the characteristics that impact ad selection.

Similarly, the present invention associates with each ad an ad-attribute profile. The ad-attribute profile has two uses: to calculate a customer's estimated ad selection probability, and to measure the uncertainty in that estimate.

The ad selection algorithm optimally selects which ad to show based on the click probability estimates and the uncertainties regarding these estimates. That is, it optimally trades off current revenue potential with future revenue potential represented by the uncertainty surrounding these estimates. Ads that have been frequently placed will have a well-documented current revenue potential while new ads with few placements represent the possibility of high future potential.

As customers have long-term interests as well as short-term demands, the present invention divides attributes into a long-term and a short-term attribute set. The long-term attribute set measures how much time customers spend in different interest categories such as business, sports, and health. The short-term attributes detect when a customer is searching for specific products.

Long-Term Attributes

Long-term customer attributes in the present invention are updated, depending on time and network constraints, on a placement-by-placement or on a time period-by-time period (for example, day-by-day) basis. The attributes measure, for example, how much time on a percentage basis a customer spends in each interest group (e.g., sports, gardening, etc.). Thus, suppose that the customer chooses sports half the time and finance half the time. Then sports and finance attributes are each 50% and the remaining attributes are 0%.

Customer interests also change. To accommodate this factor the present invention implements either a moving average or an exponentially-weighted approach to updating each customer's long-term attributes. Both of these statistical methods put more emphasis on recent information and can be updated easily.
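By way of illustration only, the exponentially-weighted approach might be sketched as follows. The function name `update_long_term`, the dictionary layout, and the smoothing constant `decay` are assumptions introduced for this sketch; the specification does not fix any of them.

```python
def update_long_term(attributes, visited_category, decay=0.9):
    """Exponentially weighted update of a customer's interest shares.

    `attributes` maps a category name to the share of time spent there.
    `decay` is a hypothetical smoothing constant (not from the source):
    values closer to 1 forget old behavior more slowly.
    """
    updated = {}
    for category, share in attributes.items():
        # The newest observation is 1 for the visited category, 0 otherwise.
        observation = 1.0 if category == visited_category else 0.0
        updated[category] = decay * share + (1.0 - decay) * observation
    return updated


# Customer from the text: sports half the time, finance half the time.
profile = {"sports": 0.5, "finance": 0.5, "health": 0.0}
profile = update_long_term(profile, "health")
print(profile)
```

Because each update is a convex combination of the old shares and a one-hot observation, the shares continue to sum to 100%, and only the current attribute values need to be stored.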

The attributes together cover all the distinctive characteristics of the customers. There are two ways the attributes can be structured. In the first, the present invention has a common set of attributes that are always updated. Alternatively, the present invention has two sets of attributes: a base set given by easily available data, and a second set of attributes that are revealed as the customer carries out certain actions.

Short-Term Attributes

The short-term attribute set of the present invention signals every time there is a specific interest for a particular service or product. For example, suppose a customer is currently shopping for a computer. Such an event can be detected by specifically marking sites that perform computer comparison tests. The probability that the customer selects a computer ad will be high.

Ad-Attribute Profiles

Customers also respond differently to different ads. The ad-attribute profile of the present invention measures how sensitive the ad is to the various attributes and thus how likely it is that a customer will react when shown an ad. As the profile for a given customer is not known ahead of time, it must be estimated. This profile estimation algorithm provides an efficient means for updating the attribute estimates in “real time.” It is not necessary to store the complete history of customers' responses, but only a set of sufficient statistics for each ad. The sufficient statistics are one square matrix variable with dimension equal to the number of attributes, one vector variable with dimension equal to the number of attributes, and two scalars. Furthermore, the sufficient statistics can be quickly calculated.

The profile estimation algorithm also records the uncertainty of each ad-attribute. The uncertainty conceals an ad's effectiveness (as measured by the true click probability). As an ad's effectiveness directly drives the revenue generation, it is important to quickly derive a good estimate. The uncertainty regarding an ad's effectiveness decreases as the number of times it is shown increases.

Optimal Selection

The ad selector of the present invention places ads in a way that maximizes the expected overall long-term ad placement revenue (or any other measure of value). The ad placement revenue is the compensation received every time an ad is clicked. For the moment, suppose that the ad-attribute profile for each ad is known with certainty. This means that the probabilities that the customer will react to the ads can be calculated. Multiplying the probabilities with the compensations of the corresponding ads yields the expected ad placement revenues for all ads. The choice that maximizes the expected overall ad placement revenue is then simply the ad with the highest expected ad placement revenue (or any other measure of value).

Unfortunately, one does not know with certainty the ad-attribute profiles. This means that the above selection algorithm, if employed using the estimated ad-attribute profile, would not correctly account for revenue generation opportunities of those ads that have not been shown, because it would not incorporate the huge estimation uncertainty of those ads.

This ad-placement algorithm incorporates the uncertainty as well as the expected ad revenue in the selection criterion. Conceptually, the uncertainty is a reflection of the ad's potential upside. That is, the true click probability of an ad with high uncertainty is more likely to be significantly higher than its estimated value than is the true click probability of an ad with low uncertainty. Only by testing can the present invention determine whether this is actually the case. If it is, there is much to gain in the future.

The ad-placement selection rule works by combining, for each ad, the volatility and the expected value of the ad placement revenue in a certain way. This rule is based on a dynamic programming approach. This approach yields the true optimal selection algorithm among all possible non-anticipating selection algorithms. The present invention adapts the dynamic programming solution to obtain a strategy that can be updated in real-time.

The basic modeling technique of the present invention is outlined below and illustrated in FIG. 7.

Basic Modeling

There are L customers 700 for each of whom the present invention tracks the value of MA customer attributes 702. Customer attributes 702 may be time-based, geography based, or any other segmentable and tractable attribute. There are N different ads in campaign 704.

The present invention maintains a customer matrix:

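| Customer ID | Attribute 1 | Attribute 2 | . . . | Attribute MA |
|-------------|-------------|-------------|-------|--------------|
| ID_1        | A_11        | A_12        | . . . | A_1MA        |
| ID_2        | A_21        | A_22        | . . . | A_2MA        |
| . . .       | . . .       | . . .       | . . . | . . .        |
| ID_L        | A_L1        | A_L2        | . . . | A_LMA        |

And an ad matrix:

| Ad ID | Attribute 1 weight | . . . | Attribute MA weight |
|-------|--------------------|-------|---------------------|
| Ad_1  | W_11               | . . . | W_1MA               |
| Ad_2  | W_21               | . . . | W_2MA               |
| . . . | . . .              | . . . | . . .               |
| Ad_N  | W_N1               | . . . | W_NMA               |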

Approach 1

-   1. The estimated probability of customer x clicking on ad i is given by

$\sum\limits_{k = 1}^{MA}{A_{xk}W_{ik}}.$

-   2. Every time a customer visits a Web site within the network, the data is collected 712 and the attributes of that customer are updated 714.
-   3. Every time a customer is shown an ad, the attribute weightings for that ad are updated 716 depending on how the customer responded.

The calculation of which ad to show 710 is then clearly quick to compute as it is essentially (MA)(N) multiplications and additions and then a comparison of the determined probabilities 708. With some careful thought, the updates of the customer and ad matrices can also be done rapidly and with numerical stability.
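The computation in Approach 1 can be sketched as follows, purely as an illustration. The function names `click_probability` and `select_ad`, the dictionary layout of the ad matrix, and the numerical attribute values and weights are all assumptions made for the example.

```python
def click_probability(customer_row, ad_row):
    """Sum over k of A_xk * W_ik for one customer row and one ad row."""
    return sum(a * w for a, w in zip(customer_row, ad_row))


def select_ad(customer_row, ad_matrix):
    """Return the ad ID with the highest estimated click probability."""
    return max(
        ad_matrix,
        key=lambda ad_id: click_probability(customer_row, ad_matrix[ad_id]),
    )


# Hypothetical data: MA = 3 attributes, N = 2 ads.
customer = [0.5, 0.5, 0.0]
ads = {
    "Ad_1": [0.02, 0.01, 0.03],
    "Ad_2": [0.01, 0.04, 0.00],
}
chosen = select_ad(customer, ads)
print(chosen, click_probability(customer, ads[chosen]))
```

Each ad costs MA multiplications and additions, so scoring all N ads takes on the order of (MA)(N) operations, which matches the cost stated above.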

As the present invention collects more data, this method continues to refine the estimates and thus is referred to as Bayesian. Ads may lose their effectiveness over time, and people's attributes will certainly evolve over time. To capture this there are several updating methods that weight recent data more heavily. All of these methods can be updated quickly and require little storage.

In use, as shown in FIGS. 2 and 3, a customer accesses a participating Web site as illustrated at 201, 301; an ad server determines the best ad to place (highest score of 150) at 202, 302; the ad is served to the Web site at 203, 303; and a click by the customer directs him to the advertiser's Web site at 204, 304.

Adding Uncertainty and Optimizing for Earning vs. Learning

Intuitively, there is a big difference between an ad that has been shown 100 times and been selected once and an ad that has been shown 10,000 times and been selected 100 times, even though each has been selected 1% of the times it has been shown. It is worth something to learn more about the first ad, as it is quite possible that it will turn out to be a very popular ad.

The present invention alters the above structure by carrying not just the mean but the standard deviation of each estimated random variable as well.

The ad selection process then works by combining the estimated probability and the standard deviation in a certain way for each ad and then comparing. When done properly, this is the optimal way to balance earning and learning.

Updates of the standard deviation can be calculated quickly as they can be based on the updates of the estimated probabilities.
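The difference between the two ads in the example above can be made concrete with a short sketch. It assumes, for illustration only, that the standard deviation of each estimate is the usual binomial standard error; the specification states only that a standard deviation is carried alongside the mean. The name `estimate_with_uncertainty` is hypothetical.

```python
import math


def estimate_with_uncertainty(clicks, impressions):
    """Return (estimated click probability, binomial standard error).

    The binomial standard error is an assumption made for this sketch.
    """
    p = clicks / impressions
    std = math.sqrt(p * (1.0 - p) / impressions)
    return p, std


# The two ads from the text: both have a 1% observed click rate.
p_new, std_new = estimate_with_uncertainty(1, 100)
p_old, std_old = estimate_with_uncertainty(100, 10_000)
print(p_new, std_new)
print(p_old, std_old)
print(std_new / std_old)
```

Under this assumption the estimate for the ad shown 100 times is ten times less certain than the estimate for the ad shown 10,000 times, which is why showing the first ad again carries more learning value.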

Adding Structure to the Matrices

The present invention is also able to learn more about a given customer from other customers than the structure described above captures. As a simple example, imagine that one has discovered that a particular ad is very popular with males and this system is considering showing it to a particular customer. The present invention has an attribute for gender, but doesn't yet know if this particular customer is male or female. However, there is lots of other data about the customer, such as interest level in sports. By looking at the attributes of all other customers, and the associated correlations, the present invention can estimate the probability that this customer is male. The present invention may find, for instance, that interest in sports is highly indicative of being male.

Choosing the Attributes

First, a key aspect of the present invention is identifying attributes that are predictive of behavior. This step requires analyzing real data, and should be re-visited periodically. Second, for numerical stability, the present invention must choose attributes that are not too similar to one another. There are several ways to choose a representative attribute set, basically by selecting orthogonal attributes. Third, the present invention needs concrete policies for deleting non-helpful attributes and splitting ones that are particularly useful. Finally, there are several statistical/data-analysis methods the present invention can employ to create updating procedures for the values of each attribute. The right procedure will depend on initial statistical tests and is also a step that should be re-visited at a later stage.

As customers have long-term interests as well as short-term demands, the present invention divides attributes into a long-term and a short-term attribute set. The long-term attribute set measures how much time customers spend in different interest categories such as business, sports, and health. Thus, suppose that the customer chooses sports half the time and finance half the time. Then sports and finance attributes are each 50% and the remaining attributes are 0%.

The short-term attributes detect when a customer is searching for specific products. For example, a customer shopping for a new computer will likely visit sites that relate to computer sales. Such sites can be marked, and computer ads placed on such sites have high probabilities of being selected, while general interest ads have markedly lower probability of being selected.

Searching among the short-term attributes for ads to show will be quick, as they only flag high probability events.

Advanced Modeling with Integrated Optimization and Scheduling

Every Web site used with the present invention sends a request for an ad every time a user accesses the site. The request is sent to the ad manager. The ad manager has a lookup table specifying ads and associated probabilities defining the ads that should be shown next for every site. This lookup table is updated frequently, such as every hour or on any other relevant time unit basis.

The system records that the ad has been shown and whether or not there was a click. The system holds a database with the number of impressions and clicks for each ad on each site by hour. The system also maintains a list of the total and remaining paid clicks for each ad, and a list of payments per click for each ad.

Basics

The goal of the optimizer-scheduler is to place ads on Web sites in such a way as to maximize the overall value for the advertising serving entity. This value may be a combination of impressions, clicks, conversions, and other value that may be obtained by placing an ad on a particular site. The probability of a given ad being clicked on varies from site to site. The present invention does not know these probabilities beforehand but, rather, the present invention continuously refines this estimate as more observations are made. There is value in obtaining additional information about these probabilities and this is accounted for in the algorithm.

Arrangements with Web sites tend to be fairly long-term. Arrangements with advertisers tend to be composed of campaigns, each lasting from days to weeks. The advertisers typically purchase a certain number of clicks. While not always spelled out explicitly, the understanding is that these clicks will occur reasonably uniformly over the campaign's lifetime. Of course, there is no way to guarantee that an ad does not fall behind schedule (it is possible that nobody chooses to click on the ad). The present invention can, however, ensure (assuming that there is a reasonably rich set of ads) that no ad gets significantly ahead of schedule. This is captured via a tunable parameter within the algorithm.

Occasionally, the arrangement with the advertiser is simply to show the ad a specified number of times. The system of the present invention serves the requested ad according to attributes described above while simultaneously tracking the number of times the ad is displayed.

While taking the full lifetime of each campaign into account, the algorithm explicitly plans for the next 24 hours or other such reasonable period, and then re-optimizes more frequently, such as every hour.

DEFINITIONS

System Variables

m denotes the number of Web sites or any reasonable partition of the Web sites in the network.

n denotes the number of ad campaigns or any reasonable collection of ads currently underway.

K denotes the set of ads that are on a pay-per-click basis or any other similar measure of performance.

M denotes the set of ads that are on a pay-per-view basis or any other reasonable measure of activity that is not performance related.

d_(j) denotes the estimated number of impressions for a first period, such as one 24-hour period or other reasonable period, at site j.

μ_(j) denotes the average clicking probability at site j calculated over a second, longer period, such as the past 30 days or other such reasonable period. Only the observed probabilities for ads that have more than, for example, 500 impressions at that site are incorporated. One possible embodiment sets μ_(j)=0.005 if site j is new; otherwise,

$\mu_{j} = {\max\left( {{\underset{i;{n_{i,j} > 500}}{Average}\left( p_{i,j} \right)},0.001} \right)}$

In this example, the use of 30 days, 500 impressions, and the tolerances of 0.005 and 0.001 are merely exemplary and are not meant as a limitation on the average clicking probability μ_(j). Other timelines and constants could also be used without departing from the scope of the invention.
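The definition of μ_(j) might be sketched as follows, using the exemplary constants given above. The function name `site_average_ctr` and the input layout are assumptions. The specification does not say what happens at an established site where no ad has yet exceeded the impression threshold; falling back to the new-site default in that case is a choice made only for this sketch.

```python
def site_average_ctr(observed, new_site=False, min_impressions=500,
                     default=0.005, floor=0.001):
    """Average clicking probability mu_j for one site.

    `observed` is a list of (p_ij, n_ij) pairs, one per ad at the site.
    """
    if new_site:
        return default
    # Only ads with more than `min_impressions` impressions are averaged.
    eligible = [p for p, n in observed if n > min_impressions]
    if not eligible:
        # Assumption: not specified in the source; reuse the new-site default.
        return default
    return max(sum(eligible) / len(eligible), floor)


print(site_average_ctr([(0.02, 1000), (0.04, 800), (0.5, 10)]))
print(site_average_ctr([(0.0002, 600)]))
print(site_average_ctr([], new_site=True))
```

The third ad in the first call has too few impressions to be trusted, so its very high observed rate does not distort the site average.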

Campaign Variables

T_(i) denotes the total duration in days of ad campaign i.

t_(i) denotes the time in days since the ad campaign of ad i began.

C_(i) denotes the maximum total number of paid clicks for ad i over the duration of the ad campaign.

c_(i) denotes the maximum number of remaining paid clicks for ad i.

Π_(i) denotes the total minimum number of impressions required by ad i over the duration of its campaign.

I_(i) denotes the minimum number of remaining impressions required for ad i. I_(i) is updated frequently, such as every hour on the hour.

s_(i) denotes the payment per click, per view, per conversion, or per any other reasonable measure of activity or performance, depending on the arrangement for ad i.

n_(i,j) is 2 plus the number of impressions for ad i at site j over the last 30 days or other such reasonable period. If the ad has never been shown at site j then n_(i,j)=2. (The present invention adds 2 to avoid problems associated with n_(i,j)=0.)

k_(i,j) is the number of clicks for ad i at site j over the duration of ad i's ad campaign.

p_(i,j) is the observed clicking probability of ad i at site j. If ad i has never been shown (n_(i,j)=2) on site j then p_(i,j)=μ_(j). Otherwise,

$p_{i,j} = {\frac{k_{i,j}}{n_{i,j}} + {\mu_{j}{\frac{2}{n_{i,j}}.}}}$

The second term here is to ensure that the present invention never has p_(i,j)=0.

δ_(i) controls the smoothness of the campaign. This can depend on the smoothness type, how the campaign is doing in terms of delivery, and other factors. A typical value is 0.2. This controls how smoothly clicks must occur throughout the lifetime of a campaign. A value of 0.2 means that no campaign can ever be more than 20% ahead of absolutely smooth (measured daily) delivery.
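As an illustration of the definitions of n_(i,j) and p_(i,j) above, the observed clicking probability can be sketched as a single function. The function name `observed_ctr` and the sample counts are assumptions.

```python
def observed_ctr(clicks, impressions, mu_j):
    """Observed clicking probability p_ij for ad i at site j.

    n_ij is 2 plus the impression count, so the formula behaves as if
    two extra impressions had been seen at the site average mu_j.
    """
    n_ij = impressions + 2
    return clicks / n_ij + mu_j * 2 / n_ij


# Never shown: the estimate equals the site average.
print(observed_ctr(0, 0, 0.005))
# Shown 998 times with 3 clicks at a site whose average is 0.5%.
print(observed_ctr(3, 998, 0.005))
# Shown 998 times with no clicks: small, but never exactly zero.
print(observed_ctr(0, 998, 0.005))
```

With no impressions the expression reduces to μ_(j), which agrees with the special case stated for an ad that has never been shown, so no separate branch is needed.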

Parameters

Set γ=1.5 or any other reasonable number. This is the learning parameter; it controls how heavily the present invention emphasizes learning about ad-site combinations for which the present invention has little information. This will be tuned via simulation.

α_(i,j) denotes the fraction of times ad i should be shown on site j for the next period, such as per hour.

Hourly or Frequent Events

The system sends the number of impressions and the number of clicks for each ad at each site to the ad manager.

The ad manager updates n_(i,j), k_(i,j), and t_(i).

The ad manager calculates p_(i,j).

Updating of c_(i) and I_(i)

These variables are used in the optimization/scheduling algorithm. First, consider c_(i). The contract for most ads specifies the beginning and end of the ad campaign and the maximum number of paid clicks. The scheduling algorithm requires a number that is to be used for one day.

In the formula below, the present invention computes the value of c_(i) that corresponds to a perfectly smooth delivery of clicks from the current time on. Note that in the linear program (LP), the present invention will not require that this be hit exactly, but rather within a pre-set tolerance.

$c_{i} = \frac{\max\left( {\left( {C_{i} - {\sum\limits_{j = 1}^{m}k_{i,j}}} \right),0} \right)}{\max \left( {\left( {T_{i} - t_{i}} \right),\frac{1}{24}} \right)}$

Now, consider I_(i). Sometimes, it is agreed that ad i must obtain a minimum number of impressions. This minimum number must be satisfied at the end of the campaign. As above, the formula below determines the number of impressions needed during the next day to achieve a smooth delivery of, in this case, impressions.

$I_{i} = \frac{\max\left( {\left( {\Pi_{i} - {\sum\limits_{j = 1}^{m}n_{i,j}} + {2*m}} \right),0} \right)}{\max\left( {\left( {T_{i} - t_{i}} \right),\frac{1}{24}} \right)}$

Note that the present invention needs the term 2*m to compensate for the fact that the present invention has adjusted n_(i,j) by adding 2 at each of the m sites.
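The two daily targets can be sketched together as follows. The function names and the sample campaign figures are assumptions introduced for illustration; the arithmetic follows the formulas for c_(i) and I_(i) given above.

```python
def daily_click_target(total_paid_clicks, clicks_by_site,
                       duration_days, elapsed_days):
    """c_i: remaining paid clicks spread evenly over the remaining days."""
    remaining = max(total_paid_clicks - sum(clicks_by_site), 0)
    # The remaining time is floored at one hour (1/24 of a day).
    days_left = max(duration_days - elapsed_days, 1 / 24)
    return remaining / days_left


def daily_impression_target(min_impressions, n_by_site,
                            duration_days, elapsed_days):
    """I_i: remaining required impressions spread over the remaining days.

    `n_by_site` holds the adjusted counts n_ij (impressions plus 2), so
    2 per site is added back before comparing with the requirement.
    """
    m = len(n_by_site)
    remaining = max(min_impressions - sum(n_by_site) + 2 * m, 0)
    days_left = max(duration_days - elapsed_days, 1 / 24)
    return remaining / days_left


# Hypothetical 30-day campaign, 10 days in, running on two sites.
print(daily_click_target(1000, [100, 150], 30, 10))
print(daily_impression_target(10000, [2002, 1002], 30, 10))
```

The floor of 1/24 in the denominator keeps the target finite in the final hour of a campaign, and the outer maximum with 0 prevents a negative target once a campaign has already met its total.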

Scheduling Problem (Solved Frequently, Such as Once Every Hour on the Hour)

Step 1. Define:

${\hat{p}}_{i,j} = {p_{i,j} + {\gamma \sqrt{\frac{p_{i,j}\left( {1 - p_{i,j}} \right)}{n_{i,j} - 1}}}}$
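The adjusted probability of Step 1 can be sketched as follows. The function name `optimistic_ctr` and the sample inputs are assumptions; γ defaults to the exemplary value of 1.5 given in the Parameters section.

```python
import math


def optimistic_ctr(p_ij, n_ij, gamma=1.5):
    """p-hat: observed probability plus gamma times its standard error.

    n_ij is at least 2 by construction, so n_ij - 1 is never zero.
    """
    return p_ij + gamma * math.sqrt(p_ij * (1.0 - p_ij) / (n_ij - 1))


# Same observed 1% rate, very different amounts of evidence.
little_data = optimistic_ctr(0.01, 102)
much_data = optimistic_ctr(0.01, 10_002)
print(little_data, much_data)
```

For equal observed probabilities, the ad-site combination with fewer impressions receives the larger adjusted value, so the scheduler is nudged toward combinations about which little is yet known.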

Step 2. Solve the Following Linear Programming Problem:

$\begin{matrix}{\underset{\{\alpha_{i,j}\}}{MAX}{\sum\limits_{i \in K}{\overset{m}{\sum\limits_{j = 1}}{\alpha_{i,j}v_{i,j}d_{j}}}}} & (1)\end{matrix}$

Subject to

$\begin{matrix}{{{\overset{m}{\sum\limits_{j = 1}}{\alpha_{i,j}p_{i,j}d_{j}}} \leq {\left( {1 + \delta_{i}} \right)c_{i}}},{i \in K}} & (2) \\{{{\overset{m}{\sum\limits_{j = 1}}{\alpha_{i,j}d_{j}}} \leq {\left( {1 + \delta_{i}} \right)I_{i}}},{i \in M}} & (3) \\{{{\overset{n}{\sum\limits_{i = 1}}\alpha_{i,j}} \leq 1},{j = 1},2,\ldots \mspace{20mu},m} & (4) \\{{\alpha_{i,j} \geq 0},{i = 1},2,\ldots \mspace{14mu},n,{j = 1},2,\ldots \mspace{14mu},m} & (5)\end{matrix}$

where v_(i,j)={circumflex over (p)}_(i,j)s_(i) if ad i is click-based or conversion-based, and s_(i) if it is impression-based.
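One way to hand the problem of Step 2 to a general-purpose LP solver is to flatten α_(i,j) into a single vector and assemble the objective and inequality rows, as sketched below. The function names `build_schedule_lp` and `is_feasible`, the flattening order, and the sample data are assumptions; the sketch only assembles and checks the problem and does not itself solve it. Constraint (5) is left to the solver's variable bounds.

```python
def build_schedule_lp(v, p, d, c, I, delta, click_ads, view_ads):
    """Assemble objective (1) and constraints (2)-(4) as dense lists.

    v, p are n-by-m nested lists; d has length m; c, I, delta have length n.
    click_ads and view_ads are the index sets K and M.
    alpha_ij is stored at position i * m + j of the flattened vector.
    """
    n, m = len(v), len(d)

    def col(i, j):
        return i * m + j

    objective = [0.0] * (n * m)
    for i in click_ads:  # (1) sums over i in K, as written in the source
        for j in range(m):
            objective[col(i, j)] = v[i][j] * d[j]

    A_ub, b_ub = [], []
    for i in click_ads:  # (2) expected clicks stay within (1 + delta_i) c_i
        row = [0.0] * (n * m)
        for j in range(m):
            row[col(i, j)] = p[i][j] * d[j]
        A_ub.append(row)
        b_ub.append((1.0 + delta[i]) * c[i])
    for i in view_ads:  # (3) expected impressions stay within (1 + delta_i) I_i
        row = [0.0] * (n * m)
        for j in range(m):
            row[col(i, j)] = float(d[j])
        A_ub.append(row)
        b_ub.append((1.0 + delta[i]) * I[i])
    for j in range(m):  # (4) fractions shown at each site sum to at most 1
        row = [0.0] * (n * m)
        for i in range(n):
            row[col(i, j)] = 1.0
        A_ub.append(row)
        b_ub.append(1.0)
    return objective, A_ub, b_ub


def is_feasible(alpha, A_ub, b_ub, tol=1e-9):
    """Check constraints (2)-(5) for a flattened alpha vector."""
    if any(a < -tol for a in alpha):
        return False
    return all(sum(r * a for r, a in zip(row, alpha)) <= b + tol
               for row, b in zip(A_ub, b_ub))


# Hypothetical data: ad 0 is click-based, ad 1 is impression-based, two sites.
v = [[0.02, 0.03], [1.0, 1.0]]
p = [[0.01, 0.02], [0.0, 0.0]]
d = [1000, 2000]
objective, A_ub, b_ub = build_schedule_lp(
    v, p, d, c=[10.0, 0.0], I=[0.0, 500.0], delta=[0.2, 0.2],
    click_ads=[0], view_ads=[1])
print(is_feasible([0.0, 0.0, 0.0, 0.0], A_ub, b_ub))
print(is_feasible([0.0, 1.0, 0.0, 0.0], A_ub, b_ub))
```

Since many solvers minimize by convention, the objective vector would typically be negated before being passed to such a solver together with `A_ub`, `b_ub`, and lower bounds of zero on every variable.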

Comments

-   (1) The objective function is to maximize the overall value, including learning about sites where we have little information.
-   (2) The LHS is the total number of expected clicks for ad i during the interval. This constraint enforces the campaign smoothness condition.
-   (3) The LHS is the total number of expected impressions for ad i during the interval. This constraint enforces the campaign smoothness condition.
-   (4) This constraint ensures that the probabilities of what ads to show at each site add to at most 100%.
-   (5) This constraint ensures that all probabilities are non-negative.

Remarks

-   (1) Setting s_(i)=1 for all i converts the objective function into one that seeks to maximize the overall Click-Thru-Rate (CTR).
-   (2) There is no explicit constraint ensuring that each ad does not fall “too far behind”. The reason is that such a constraint could leave the linear program (LP) with no feasible solution.
-   (3) To account for the remark above, campaigns should be monitored on a frequent basis (daily), with poor ads being removed or outsourced.
-   (4) Note that there is always a solution to the LP: since c_(i) and I_(i) are non-negative, α_(i,j)=0 for all i and j satisfies constraints (2) through (5).

Creating an Ad Lookup Table

The present invention describes the process of converting the output of the linear program (LP) into a lookup table. For each site j and ad i, multiply α_(i,j) by 100 and round off the product to the nearest integer. Let β_(i,j)=Round(100*α_(i,j)). β_(i,j) represents how many times out of a hundred ad i should be shown at site j. Create a list for site j by letting the first β_(1,j) elements be ad 1, letting the next β_(2,j) be ad 2, and so forth.

This process will yield a list of approximately 100 ads for each site (many ads will appear several times for a given Web site). The next step is to ensure that the list has exactly 100 ads for each site. This is done by truncating the list for any site with more than 100, and repeating the first ad on the list as many times as necessary for any site with fewer than 100. It is possible to employ a frequency-capping component at this stage of the algorithm.
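The table construction above can be sketched as follows. This is an illustration only: the function name and the 0-based ad indices are assumptions, and Python's built-in `round` resolves exact halves to the nearest even integer, which may differ from the rounding rule intended in the original.

```python
def build_lookup_table(alphas_for_site, size=100):
    """Build the ad list for one site j from its alpha_{i,j} values.

    alphas_for_site[i] is the LP weight of ad i at this site.
    """
    table = []
    for ad, alpha in enumerate(alphas_for_site):
        beta = int(round(size * alpha))  # beta_{i,j}: slots out of `size`
        table.extend([ad] * beta)
    if not table:
        return table                     # no ad scheduled at this site
    if len(table) > size:
        table = table[:size]             # truncate an over-long list
    else:
        # Pad a short list by repeating the first ad on the list
        table.extend([table[0]] * (size - len(table)))
    return table
```

One way to use such a table, again as an assumption, is to pick a uniformly random entry for each ad request, so that ad i is shown at site j roughly β_(i,j) times out of a hundred.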

Daily Routine

Calculate d_(j) and μ_(j) over the last 30 days or other such reasonable period, as shown in the schematic diagram of FIG. 4. When new sites or new ads 410 are added, constraints are prepared 420, and the new matrices are added to the ad server's optimization engine 430. Prior to having adequate data, initial estimates (alphas) 435 are used and the data is added to the ad look-up tables 440. The ads are then served at 450 (with testing 490 and frequency capping 492). Response data is collected at 460 and recorded together with the ad serving information in transaction log 470. The data is then used to update parameters at 480, and the iterative process continues.

Enhancements

This framework allows for a number of additional constraints to be added in a natural way.

Click Probability Estimation with Principal Components

Above, the probability that users visiting Web site j will click on ad i was estimated by dividing the number of clicks on ad i at Web site j by the number of impressions of ad i at Web site j, but it can be estimated by any other reasonable method.

An alternative is a principal component approach to banner ad probability estimation. This approach contains two steps. In the first step we estimate the principal component vectors, whereas in the second step we estimate the banner ads' click probabilities. Both steps are updated as new information becomes available.

The advantage of using the principal component approach is significant. For example, if there are 100 Web sites and 5 principal components, then the conventional approach requires approximately 20 times as many impressions as the principal component approach to reach the same level of accuracy.

This approach is begun by presenting a series of definitions. It continues by describing the principal component estimation, and concludes by describing the probability estimation.

DEFINITIONS

-   Probabilities: Estimate of the probability that users downloading ad i from Web site j will click on that ad is p_(i,j).
-   Error: Uncertainty of the estimate p_(i,j) is σ_(i,j)=p_(i,j)*(1−p_(i,j))/n (a slightly biased estimate).
-   Sites: There are m sites.
-   Site Average: Let μ_(j) denote the average click probability on site j.
-   Normalized Ad Probability Vector: For each ad i we define the vector y_(i)=[y_(i,1), y_(i,2), . . . , y_(i,m)] where

$y_{i,j} = {\frac{\left( {p_{i,j} - 1} \right)}{\sigma_{i,j}}.}$

-   Principal Components: Hypothesize that there exist l m-dimensional vectors x₁, x₂, . . . , x_(l), such that every ad probability vector is a linear combination of x₁, x₂, . . . , x_(l).
-   Other: Let n_(i,j) denote the number of impressions of ad i on Web site j and let k_(i,j) denote the number of clicks of ad i on Web site j.

Principal Components Estimation

When using principal components estimation, the present invention identifies ads that have been shown a large number of times at many Web sites. These are the ads that will be used to calculate the principal components.

Step 1. Calculate estimation of site averages.

$\mu_{j} = \frac{\sum\limits_{i}p_{i,j}}{{Count}\left( {i\mspace{14mu} {on}\mspace{14mu} j} \right)}$

Step 2. Calculate the variance of the error of each probability estimate.

σ_(i,j) =p _(i,j)*(1−p _(i,j))/n

Step 3. Calculate normalized ad probability vectors.

Step 4. Calculate the principal components by first creating the matrix Y. Row i of Y corresponds to ad i. Then calculate the matrix product Y^(T)Y. Then find the eigenvectors and eigenvalues of Y^(T)Y. Choose the k eigenvectors corresponding to the k eigenvalues which together account for at least x % of the sum of all eigenvalues. The first principal component corresponds to the first eigenvector as follows: Element i of the eigenvector is the weight associated with ad i. Therefore, multiply the elements of the first eigenvector with their corresponding estimated probabilities for each site and sum over these newly found values to determine the first principal component vector. Repeat the procedure for the remaining k−1 eigenvectors.
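The selection rule in Step 4 can be sketched as below. The sketch assumes the eigenvalues of Y^(T)Y have already been computed by some linear algebra routine, and it assumes that the largest eigenvalues are taken first and that the smallest sufficient k is wanted; the original only states the x % threshold. The function name is illustrative.

```python
def choose_num_components(eigenvalues, x_percent):
    """Smallest k whose k largest eigenvalues reach x_percent of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running >= (x_percent / 100.0) * total:
            return k
    return len(ordered)
```

For eigenvalues 5, 3, 1, 1 and a 75% threshold, the two largest eigenvalues already account for 80% of the total, so two principal components are kept.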

Banner Ad Click Probability Estimation

With the principal components available there are a variety of ways to estimate an ad's click probabilities. Two straightforward methods of such estimation are ordinary least squares regression and generalized least squares regression.

The objective of the principal component approach is to efficiently and quickly obtain ad probabilities for a majority of banners. In addition to finding the probabilities for the majority, it is also necessary to identify banners where the principal components do not capture a significant portion of the observed probabilities. A maximum likelihood approach can be used to integrate this aspect into the probability estimation routine.

Binomial Updating of Click Probabilities Using Principal Components

Consider a row of n cells that have unknown click probabilities p_(i), where cells are i=1, 2, . . . , n.

Assume there is a single (for notational simplicity) principal component that is likely to give these probabilities. This principal component is a vector v=(v₁, v₂, . . . , v_(n))≧0. Then model the vector P as

P=av+e

where a is an unknown constant and e=(e₁, e₂, . . . , e_(n)) is a vector of errors.

Then assume that the e_(i)'s are independent, normal random variables with zero mean and variance σ². The variance is determined by the process that determines the principal components.

Now, imagine the system has been run for a while and has observed k_(i) clicks from n_(i) impressions in cell i. It is then desirable to assign the best p_(i)'s.

The joint probability of those click rates and the probabilities given a is

$P = C\prod\limits_{i = 1}^{n}\exp\left\lbrack {\frac{- 1}{2\sigma^{2}}\left( {p_{i} - {av_{i}}} \right)^{2}} \right\rbrack\prod\limits_{i = 1}^{n}p_{i}^{k_{i}}\left( {1 - p_{i}} \right)^{n_{i} - k_{i}}$

where C is a constant independent of a and the p_(i)'s.

Now determine a and the p_(i)'s by maximizing P with respect to a and the p_(i)'s. Ignoring C and taking logarithms, we obtain:

$\begin{matrix}{{\ln \; P} = {{\frac{- 1}{2\sigma^{2}}{\sum\limits_{i = 1}^{n}\left( {p_{i} - {a\; v_{i}}} \right)^{2}}} + {\sum\limits_{i = 1}^{n}\left\lbrack {{k_{i}\; \ln \; p_{i}} + {\left( {n_{i} - k_{i}} \right){\ln \left( {1 - p_{i}} \right)}}} \right\rbrack}}} & \left. {(*} \right)\end{matrix}$

Note that ln P is concave with respect to a and p_(i)'s ≥ 0, so maximization is well-defined.

Note that (as one would expect) if σ>>0 and/or n_(i), k_(i) large, one finds p_(i)=k_(i)/n_(i). Also, for σ small and/or n_(i), k_(i) small, one finds p_(i)=av_(i).

Now, the problem is separable with respect to the p_(i)'s, so one strategy is to maximize with respect to p_(i) with a fixed. This gives the necessary condition:

${F\left( p_{i} \right)} = {{{\frac{- 1}{\sigma^{2}}\left( {p_{i} - {av}_{i}} \right)} + \frac{k_{i}}{p_{i}} - \frac{\left( {n_{i} - k_{i}} \right)}{\left( {1 - p_{i}} \right)^{2}}} = 0}$

Note that F(0)=+∞ and that F(1)=−∞. Hence, there is a p_(i) with 0<p_(i)<1 and F(p_(i))=0.

Furthermore,

${F^{\prime}\left( p_{i} \right)} = {{\frac{- 1}{\sigma^{2}} - \frac{k_{i}}{p_{i}^{2}} - \frac{\left( {n_{i} - k_{i}} \right)}{\left( {1 - p_{i}} \right)^{2}}} < 0}$

so F is monotone. Thus, the solution is unique.

It can therefore be concluded that for a given a, there is for each i=1, 2, . . . , n a unique p_(i), 0<p_(i)<1, that can be easily found by Newton's method or any other descent method. (The case of k_(i)=0 is handled separately later.)
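A minimal sketch of the one-variable solve for a single cell, assuming 0<k_(i)<n_(i) so that F runs from +∞ to −∞ on (0, 1). The derivative of ln P with respect to p_(i) and its slope F′ are coded directly; the bisection fallback that keeps each Newton step inside the bracket is an added safeguard and is not part of the original description. Names are illustrative.

```python
def solve_cell_probability(k, n, a, v, sigma2, tol=1e-12, max_iter=200):
    """Root of F(p) = -(p - a*v)/sigma2 + k/p - (n - k)/(1 - p) on (0, 1)."""
    def F(p):
        return -(p - a * v) / sigma2 + k / p - (n - k) / (1.0 - p)

    def dF(p):
        # Always negative, so F is strictly decreasing
        return -1.0 / sigma2 - k / p ** 2 - (n - k) / (1.0 - p) ** 2

    lo, hi = 0.0, 1.0   # bracket with F > 0 near lo and F < 0 near hi
    p = k / n           # start from the raw click rate
    for _ in range(max_iter):
        f = F(p)
        if f == 0.0:
            return p
        if f > 0:
            lo = p      # root lies to the right of p
        else:
            hi = p      # root lies to the left of p
        newton = p - f / dF(p)
        # Accept the Newton step only if it stays inside the bracket
        p_next = newton if lo < newton < hi else 0.5 * (lo + hi)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p
```

With a small σ² the result is pulled toward av_(i), and with a very large σ² it stays at the raw rate k_(i)/n_(i), matching the limiting behavior noted earlier.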

Now, consider p_(i) to be a function of a. Then,

$\begin{matrix}{{\frac{\partial}{\partial a}\ln P} = {\frac{\partial\ln P}{\partial a} + {\sum\limits_{i = 1}^{n}{\frac{\partial\ln P}{\partial p_{i}}p_{i}^{\prime}(a)}}}} \\{= \frac{\partial\ln P}{\partial a}} \\{= {\frac{1}{\sigma^{2}}{\sum\limits_{i = 1}^{n}{\left( {p_{i}(a) - {av}_{i}} \right)v_{i}}}}}\end{matrix}$

This discussion motivates the following algorithm:

1. Select initial a

2. Find the p_(i)'s by solving F_(i)(p_(i),a)=0 (Newton's method, one variable at a time)

3. Evaluate $\frac{\partial}{\partial a}\ln P$

4. Adjust a by steepest ascent (a step along the gradient, since ln P is being maximized)
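The four steps can be sketched end to end as follows. Several choices here are assumptions rather than details from the original: bisection stands in for Newton's method in step 2 to keep the sketch short, the step length σ²/Σv_(i)² is one convenient choice for step 4, every cell is assumed to satisfy 0<k_(i)<n_(i), and all names are illustrative.

```python
def fit_scale_and_probabilities(k, n, v, sigma2, a_init, outer_iters=200):
    """Alternate between solving for the p_i's and adjusting a."""
    def solve_p(ki, ni, avi):
        # Bisection on the strictly decreasing F(p); stands in for Newton
        lo, hi = 1e-12, 1.0 - 1e-12
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            f = -(mid - avi) / sigma2 + ki / mid - (ni - ki) / (1.0 - mid)
            if f > 0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    def solve_all(a):
        return [solve_p(ki, ni, a * vi) for ki, ni, vi in zip(k, n, v)]

    step = sigma2 / sum(vi * vi for vi in v)  # assumed step length
    a = a_init                                # step 1: initial a
    for _ in range(outer_iters):
        p = solve_all(a)                      # step 2: p_i's for this a
        # step 3: d(ln P)/da = (1/sigma2) * sum((p_i - a*v_i) * v_i)
        grad = sum((pi - a * vi) * vi for pi, vi in zip(p, v)) / sigma2
        a += step * grad                      # step 4: move a uphill
    return a, solve_all(a)
```

With this step length, each update sets a to Σp_(i)v_(i)/Σv_(i)², which is the exact maximizer of (*) over a when the p_(i)'s are held fixed.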

Note that the extension to multiple principal components is straightforward.

Case of k_(i)=0

The necessary condition is

(1−p_(i))(av_(i)−p_(i))=n_(i)σ²

It is easy to see that if av_(i)>n_(i)σ², then there is a solution with 0<p_(i)<1. Otherwise, p_(i)=0 should be used. Putting this together, p_(i)=max{root₁,0} where root₁ is the root of the quadratic less than 1. That is,

${root}_{1} = \frac{1 + {av}_{i} - \sqrt{\left( {1 + {av}_{i}} \right)^{2} - {4\left( {{av}_{i} - {n_{i}\sigma^{2}}} \right)}}}{2}$

Note that it follows from this that if n_(i)=0, we have p_(i)=av_(i). If k_(i)=0 repeatedly, one does not set p_(i)=0 until there have been at least

$n_{i} = \frac{{av}_{i}}{\sigma^{2}}$

impressions.
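The zero-click case has a closed form, so no iteration is needed. A minimal sketch of p_(i)=max{root₁,0}, with illustrative names:

```python
import math

def zero_click_probability(n, a, v, sigma2):
    """p_i for a cell with k_i = 0 clicks after n impressions."""
    av = a * v
    # Discriminant of p^2 - (1 + av) p + (av - n sigma2) = 0;
    # it equals (1 - av)^2 + 4 n sigma2, so it is never negative
    disc = (1.0 + av) ** 2 - 4.0 * (av - n * sigma2)
    root1 = (1.0 + av - math.sqrt(disc)) / 2.0  # the root below 1
    return max(root1, 0.0)
```

With no impressions the estimate is the prior value av_(i), and it falls to zero once n_(i) reaches av_(i)/σ².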

Initial Value of a

If all the n_(i)'s are small, and/or σ² is small, we set p_(i)=av_(i) for all i.

Then,

${{Ln}\; P} = {{\sum\limits_{i = 1}^{n}\; {k_{i}\ln \; {av}_{i}}} + {\sum\limits_{i = 1}^{n}\; {\left( {n_{i} - k_{i}} \right){\ln \left( {1 - {av}_{i}} \right)}}}}$$\frac{{\partial\ln}\; P}{\partial a} = {{{\sum\limits_{i = 1}^{n}\; \frac{k_{i}}{a}} - {\sum\limits_{i = 1}^{n}\; {\frac{\left( {n_{i} - k_{i}} \right)}{1 - {av}_{i}}v_{i}}}} = 0}$

Solve for a.

This can be interpreted by multiplying by a.

${\sum\limits_{i = 1}^{n}\; k_{i}} = {\sum\limits_{i = 1}^{n}\; {\left( {n_{i} - k_{i}} \right)\frac{{av}_{i}}{1 - {av}_{i}}}}$

which shows that a is set to balance the overall probabilities consistent with observed clicks and impressions.
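The equation for the initial a has no closed form in general, but its left side minus its right side is strictly decreasing in a, so a simple bracketing search works. A minimal sketch using bisection, assuming every v_(i)>0 and n_(i)>k_(i) (the function name and the choice of bisection are illustrative):

```python
def initial_scale(k, n, v, iters=200):
    """Solve sum(k_i)/a = sum((n_i - k_i) * v_i / (1 - a*v_i)) for a."""
    total_clicks = sum(k)
    if total_clicks == 0:
        return 0.0  # no clicks observed, so a = 0 maximizes ln P

    def g(a):
        # d(ln P)/da with p_i = a*v_i; decreasing in a
        return total_clicks / a - sum(
            (ni - ki) * vi / (1.0 - a * vi) for ki, ni, vi in zip(k, n, v)
        )

    lo, hi = 0.0, 1.0 / max(v)  # keeps every a*v_i below 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a single cell with v=1, the result reduces to the raw click rate k/n.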

Prior Distribution on a

A prior density can be placed on a, of the form

$\frac{1}{\sqrt{2\pi}\omega}\exp \left\{ {{- \frac{1}{2\omega^{2}}}\left( {a - a_{0}} \right)^{2}} \right\}$

This adds the term

${- \frac{1}{2\omega^{2}}}\left( {a - a_{0}} \right)^{2}$

to ln P as defined in (*) above.
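As a worked consequence (a sketch derived here from (*) and the prior term, not a formula stated in the original), the derivative evaluated in step 3 of the algorithm gains one extra term:

$\frac{\partial}{\partial a}\ln P = \frac{1}{\sigma^{2}}\sum\limits_{i = 1}^{n}\left( p_{i} - av_{i} \right)v_{i} - \frac{1}{\omega^{2}}\left( a - a_{0} \right)$

Setting this to zero with the p_(i)'s held fixed gives

$a = \frac{\frac{1}{\sigma^{2}}\sum\limits_{i = 1}^{n}p_{i}v_{i} + \frac{a_{0}}{\omega^{2}}}{\frac{1}{\sigma^{2}}\sum\limits_{i = 1}^{n}v_{i}^{2} + \frac{1}{\omega^{2}}}$

so a small ω pulls a toward a₀, while a large ω leaves the data-driven estimate essentially unchanged.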

Category Restrictions

Certain advertisers would like to have their ads displayed only on a subset of the sites. This is handled in the following way. Let the subset of such sites be denoted by J. This might be, for example, the set of all sports related sites. Then, if the present invention is considering ad i, the restriction takes the form:

α_(i,j)=0 for all j∉J.

The subset J can, of course, involve multiple levels of categories, generally chosen by the advertiser. A typical subset could be something like ‘all of the sports related, Spanish language, G-rated sites.’

Ad Blocking

Conversely, certain Web sites would like to prevent particular ads from appearing on their site. This may be the case, for instance, if the item being advertised is viewed as a competitor to the Web site's product. Let the site be denoted by j and the set of ads to be blocked be denoted by the set I. Then the restriction has the form

α_(i,j)=0 for all i∈I.

Typically, a Web site would be able to do this by both blocking entire categories, such as R-rated sites, and by selecting particular ads for exclusion, such as one of a direct competitor.
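Both kinds of restriction force selected α_(i,j) to zero, so they can be prepared together as a mask before the LP is solved. A minimal sketch, assuming 0-based ad and site indices and dictionary inputs whose names and shapes are illustrative rather than taken from the original:

```python
def allowed_pairs(num_ads, num_sites, ad_site_whitelist, site_blocklist):
    """Return allowed[i][j]; False means alpha_{i,j} must be fixed at 0.

    ad_site_whitelist: {ad i: set J of sites the advertiser permits};
                       ads not listed may run on any site.
    site_blocklist:    {site j: set I of ads the site has blocked}.
    """
    allowed = [[True] * num_sites for _ in range(num_ads)]
    for i in range(num_ads):
        for j in range(num_sites):
            # Category restriction: ad i may only appear on sites in J
            if i in ad_site_whitelist and j not in ad_site_whitelist[i]:
                allowed[i][j] = False
            # Ad blocking: site j refuses the ads in I
            if i in site_blocklist.get(j, set()):
                allowed[i][j] = False
    return allowed
```

Every pair marked False would then be passed to the LP as the equality α_(i,j)=0, or simply dropped from the variable set.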

Click-Thru-Rate (CTR) of Impression Based Ads

Even with contracts that are strictly impression based, it may be advantageous to attempt to enhance the CTR of such ads. Providing a good CTR may lead to more future business. To do this, the present invention must determine how valuable each click on an impression based ad is in economic terms. Then, this can simply be added to the objective function.

Clustering Process

Automatic clustering of small Web sites can be employed in a manner that effectively improves overall Click-Thru-Rates. To form clusters, the process starts by matching each ad with a campaign type, which is assigned through a GUI. There are types for ‘Personal Finance’, ‘Sports’, ‘Computers and Technology’, and the like. The present invention denotes each campaign type t_(i), i=1, 2, . . . , 20, and the set of all campaign types T. Each cluster will correspond to one of these types.

To determine which types will be used for clustering, a database with the history of the last 30 days (or other reasonable period) is used to count all the impressions for each type. If the objective is to form n clusters, then the first n types ordered by descending number of impressions are selected to be the clustering types. Now call each clustering type t̂_(j), j=1, 2, . . . , n, and the set of all the clustering types T̂. Each clustering type is assigned a number (ID) starting from 2 and going up to n+1. A Webmaster with cluster ID=0 was not clustered, and one with ID=1 is in a cluster of special Webmasters.

The database contains information on all the campaign types that each Webmaster showed. Not all Webmaster-type pairs in the database will be used to perform the computations; in one embodiment, only those that meet the following requirements are used:

-   It must have more than 2 impressions on a type
-   It must have more than 1 click on a type
-   The CTR for a type must be less than 100%

Although this is a preferred screening process, any other such reasonable screening process can be used without departing from the scope of the present invention.

In addition, the set of campaign types for a Webmaster must be a superset of the clustering types: T̂⊂T_(m), where m represents a particular Webmaster.

Each Webmaster will be assigned to one and only one cluster, so it will have a corresponding cluster ID, ID_(m). Only one more piece of information is needed to determine the cluster ID of each Webmaster: p-hat.

${{p\_ hat}_{m,{\hat{t}}_{i}} = {{CTR}_{m,i} + {\gamma \sqrt{\frac{{CTR}_{m,i}\left( {1 - {CTR}_{m,i}} \right)}{{imps}_{m,i}}}}}},$

where γ is a learning parameter, m is the Webmaster, i is the campaign type, and imps_(m,i) refers to the number of impressions for the Webmaster-campaign type pair. Now,

${{ID}_{m} = {1 + {\arg {\max\limits_{j}\left( {p\_ hat}_{m,{\hat{t}}_{j}} \right)}}}},{j = 1},2,\ldots \mspace{14mu},{n.}$

Each j corresponds to a clustering type, as defined before.

Thus, the object is to look for the max p-hat for each Webmaster. The type associated with the max p-hat will be the cluster assigned to the Webmaster. In order to write the output, the present invention translates the type to its cluster ID.
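The assignment rule can be sketched for a single Webmaster as follows. The data layout and names are illustrative, and one interpretation is assumed: a Webmaster whose data is missing for any clustering type, or fails the screening requirements for it, is left unclustered (ID 0). The special cluster with ID 1 is not modeled.

```python
import math

def assign_cluster_id(stats, clustering_types, gamma):
    """Return ID_m = 1 + argmax_j p_hat, or 0 if the Webmaster is not clustered.

    stats: {campaign type: (impressions, clicks)} for one Webmaster.
    clustering_types: the n clustering types in order, j = 1..n.
    """
    if not clustering_types:
        return 0
    best_j, best_p_hat = None, -1.0
    for j, ctype in enumerate(clustering_types, start=1):
        if ctype not in stats:
            return 0  # type set is not a superset of the clustering types
        imps, clicks = stats[ctype]
        # Screening: > 2 impressions, > 1 click, CTR below 100%
        if imps <= 2 or clicks <= 1 or clicks >= imps:
            return 0
        ctr = clicks / imps
        p_hat = ctr + gamma * math.sqrt(ctr * (1.0 - ctr) / imps)
        if p_hat > best_p_hat:
            best_j, best_p_hat = j, p_hat
    return 1 + best_j
```

Because j starts at 1, the returned IDs run from 2 to n+1, matching the numbering of the clustering types.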

Splitting Large Clusters

It could be the case that once clusters are formed, the total number of impressions for one of them will be over 20% or any other reasonable set percentage of the total number of impressions for all the clusters. In this case, it is desirable to split the cluster by applying the clustering process to those Webmasters in the largest cluster, and by forming a new set of two clustering types for them that excludes the type associated with the cluster. For instance, if cluster 3 with associated type ‘Sports’ is the target, then a new clustering type set might be {‘Entertainment’, ‘Health’}, which will be chosen because they are the two types with the most and second-most impressions. Each Webmaster will be assigned a new cluster ID using the same “max p-hat” criterion.

The splitting process is repeated until no cluster has more than 20% of all the impressions.

Integrated Channel Management

It is also desirable to optimize ad placement across a diverse set of media, such as banners, e-mail, and wireless, in an integrated manner. An allocator 500, as shown in FIG. 5, can be used to serve full-sized 510, odd-sized 520, and other type 530 ads using the following algorithm:

DEFINITIONS

V_(i)=Expected impressions per period, such as per day, of media type i.

p_(ij)=Probability of a click on media type i for campaign j.

G_(j)=Total target number of clicks for campaign j for the period.

ζ_(ij)=The percent of all impressions from media i that will be allocated to campaign j.

${Max}{\sum\limits_{i,j}\; {p_{ij}ϛ_{ij}V_{i}}}$${s.t.\mspace{14mu} {\sum\limits_{j}\; ϛ_{ij}}} \leq {1\mspace{14mu} {for}\mspace{14mu} {all}\mspace{14mu} i}$${\sum\limits_{i}\; {p_{ij}ϛ_{ij}V_{i}}} \leq {\left( {1 + \delta} \right)G_{j}\mspace{14mu} {for}\mspace{14mu} {all}\mspace{14mu} j}$ϛ_(ij) ≥ 0  for  all  i  and  j

Of course, constraints enforcing minimum and maximum representation on various channels are possible as well.

Then, p_(ij)ζ_(ij)V_(i) is sent to the LP as the upper bound for campaign j for channel type i.

Multiple Ads from One Customer

From time to time, an advertiser will employ multiple banner designs. One approach to this, of course, is simply to treat each of these as a separate ad. However, if the advertiser is willing to let the optimizer select which ads to show, the present invention can expect on average an improvement in the CTR. Imagine that the two ads are labeled l and m, and that the initial click totals (on an average daily basis) were c_(l) and c_(m). Then, normally the present invention would have included the two constraints:

${\sum\limits_{j = 1}^{m}\; {\alpha_{l,j}p_{l,j}d_{j}}} \leq {\left( {1 + \delta} \right)c_{l}}$${\sum\limits_{j = 1}^{m}\; {\alpha_{m,j}p_{m,j}d_{j}}} \leq {\left( {1 + \delta} \right)c_{m}}$

Instead, the present invention can replace these with the single constraint below, which is less restrictive and therefore will result in a better or equal solution:

${\sum\limits_{j = 1}^{m}\; \left( {{\alpha_{l,j}p_{l,j}d_{j}} + {\alpha_{m,j}p_{m,j}d_{j}}} \right)} \leq {\left( {1 + \delta} \right)\left( {c_{l} + c_{m}} \right)}$

It is also possible to do something in between the above two solutions. For example, an advertiser with two different ad designs could ask for a total of 10,000 clicks with a minimum of 2,500 each. Therefore, there are many other reasonable solutions.

The method of the present invention can be practiced by conventional servers 620, 630, such as Pentium III based systems operating with Windows NT, interacting over the Internet 600 to collect attribute information about customers 640 and ads from database 610, and then serve the ads to the customers 640 operating Internet enabled devices with browsers, such as Apple Macintosh or Windows-based personal computers with browser clients like Internet Explorer or Netscape Navigator, as shown in FIG. 6. As such, there are no special requirements for the user interaction on the Internet using the present invention. Conventional PCs, which may be Pentium based or Apple Macintosh type processors, are all suitable processors for exercising the present invention. Likewise, the server of the present invention can be an Intel Pentium type server, Sun server or other server suitable for serving advertisements.

Numerous aspects of the present invention also have separate utility outside of any Internet enabled distribution channels. The basic modeling methodologies and algorithms of the present invention are therefore able to be incorporated with virtually any other marketing medium in which an “ad” is displayed to a “customer,” including, but not limited to, mail, telephone, facsimile, television, radio, and print media. Other embodiments, with modifications and changes to the preferred embodiment, will be apparent to those skilled in the art without departing from the scope of the present invention as disclosed. Therefore, the present invention is only limited by the claims appended hereto.

1. A method comprising: receiving an advertisement request in response to one or more actions by a user on a marketing medium; determining, using at least one processor, an estimated selection probability for one or more advertisements of a plurality of advertisements, wherein the estimated selection probability indicates a likelihood that the user will select a given advertisement; identifying an advertisement with a high estimated selection probability; determining if the advertisement with the high estimated selection probability is on a block list associated with the marketing medium; if the advertisement with the high estimated selection probability is on the block list, selecting another advertisement to serve to the marketing medium; and if the advertisement with the high estimated selection probability is not on the block list, serving the advertisement with the high estimated selection probability to the marketing medium.
2. The method as recited in claim 1, wherein the block list comprises one or more advertisements from a competitor.
3. The method as recited in claim 1, wherein the block list comprises a category of advertisements.
4. The method as recited in claim 3, wherein the category comprises advertisements for explicit content.
5. The method as recited in claim 1, further comprising maintaining a user profile, the user profile including user attributes reflecting interests of the user.
6. The method as recited in claim 5, further comprising updating the user attributes of the user profile based on one or more of Internet sites visited by the user, advertisements selected by the user, or internet searching by the user.
7. The method as recited in claim 6, further comprising maintaining an advertisement profile for the one or more advertisements of the plurality of advertisements, each advertisement profile including ad-attributes that reflect how much an advertisement correlates to a given user attribute.
8. The method as recited in claim 7, wherein the estimated selection probability for a particular advertisement is a function of the consumer attributes of the user and the advertisement profile for the particular advertisement.
9. The method as recited in claim 1, wherein the marketing medium comprises a software program.
10. The method as recited in claim 9, wherein the marketing medium comprises one or more websites.
11. The method as recited in claim 9, wherein the marketing medium is a software program on a mobile device.
12. The method as recited in claim 1, wherein the marketing medium comprises a mobile device.
13. The method as recited in claim 1, wherein the one or more actions comprise accessing the marketing medium.
14. A non-transitory computer-readable storage medium including a set of instructions that, when executed, cause at least one processor to perform steps comprising: receiving an advertisement request in response to one or more actions by a user on a marketing medium; determining an estimated selection probability for one or more advertisements of a plurality of advertisements, wherein the estimated selection probability indicates a likelihood that the user will select a given advertisement; identifying an advertisement with a high estimated selection probability; determining if the advertisement with the high estimated selection probability is on a block list associated with the marketing medium; if the advertisement with the high estimated selection probability is on the block list, selecting another advertisement to serve to the marketing medium; and if the advertisement with the high estimated selection probability is not on the block list, serving the advertisement with the high estimated selection probability to the marketing medium.
15. The computer-readable storage medium as recited in claim 14, wherein the block list comprises one or more advertisements from a competitor.
16. The computer-readable storage medium as recited in claim 14, wherein the block list comprises a category of advertisements.
17. The computer-readable storage medium as recited in claim 16, wherein the category comprises advertisements for explicit content.
18. The computer-readable storage medium as recited in claim 14, further comprising instructions that, when executed, cause the at least one processor to maintain a user profile, the user profile including user attributes reflecting interests of the user.
19. The computer-readable storage medium as recited in claim 18, further comprising instructions that, when executed, cause the at least one processor to update the user attributes of the user profile based on one or more of Internet sites visited by the user, advertisements selected by the user, or internet searching by the user.
20. The computer-readable storage medium as recited in claim 19, further comprising instructions that, when executed, cause the at least one processor to maintain an advertisement profile for the one or more advertisements of the plurality of advertisements, each advertisement profile including ad-attributes that reflect how much an advertisement correlates to a given user attribute.
21. The computer-readable storage medium as recited in claim 20, wherein the estimated selection probability for a particular advertisement is a function of the consumer attributes of the user and the advertisement profile for the particular advertisement.
22. The computer-readable storage medium as recited in claim 14, wherein the marketing medium comprises a software program.
23. The computer-readable storage medium as recited in claim 22, wherein the marketing medium comprises one or more websites.
24. The computer-readable storage medium as recited in claim 22, wherein the marketing medium is a software program on a mobile device.
25. The computer-readable storage medium as recited in claim 14, wherein the marketing medium comprises a mobile device.
26. The computer-readable storage medium as recited in claim 14, wherein the one or more actions comprise accessing the marketing medium.
27. A method comprising: receiving an advertisement request in response to one or more actions of a user on a marketing medium; determining, using at least one processor, one or more interests of the user; selecting a first advertisement targeted to the one or more interests; identifying one or more categories associated with the marketing medium; and if the one or more categories is a permitted category, serving the first advertisement to the marketing medium.
28. The method as recited in claim 27, wherein if the one or more categories is not a permitted category, serving another advertisement to the marketing medium.
29. The method as recited in claim 27, further comprising receiving, from an advertiser associated with the first advertisement, a list of one or more permitted categories.
30. The method as recited in claim 28, wherein the one or more categories comprise multi-levels of categories.
31. The method as recited in claim 30, wherein the one or more categories comprise a sub-set of advertisements in a first category.
32. The method as recited in claim 27, further comprising maintaining a user profile, the user profile including user attributes reflecting interests of the user.
33. The method as recited in claim 32, further comprising updating the user attributes of the user profile based on one or more of Internet sites visited by the user, advertisements selected by the user, or internet searching by the user.
34. The method as recited in claim 32, further comprising maintaining an advertisement profile for one or more advertisements of the plurality of advertisements, each advertisement profile including ad-attributes that reflect how much an advertisement correlates to a given user attribute.
35. The method as recited in claim 34, further comprising: determining an estimated selection probability for the one or more advertisements of the plurality of advertisements; wherein: the estimated selection probability indicates a likelihood that the user will select a given advertisement; the estimated selection probability is a function of the consumer attributes associated with the user and the advertisement profile for the given advertisement.
36. The method as recited in claim 35, wherein selecting a first advertisement related to the one or more interests comprises selecting an advertisement with the highest estimated selection probability.
37. The method as recited in claim 27, wherein the marketing medium comprises a software program.
38. The method as recited in claim 37, wherein the marketing medium comprises one or more websites.
39. The method as recited in claim 37, wherein the marketing medium is a software program on a mobile device.
40. The method as recited in claim 27, wherein the marketing medium comprises a mobile device.
41. The method as recited in claim 27, wherein the one or more actions comprise accessing the marketing medium.
 42. A non-transitory computer-readable storage mediumincluding a set of instructions that, when executed, cause at least oneprocessor to perform steps comprising: receiving an advertisementrequest in response to one or more actions of a user on a marketingmedium; determining one or more interests of the user; selecting a firstadvertisement targeted to the one or more interests; identifying one ormore categories associated with the marketing medium; and if the one ormore categories is a permitted category, serving the first advertisementto the marketing medium.
 43. The computer-readable storage medium asrecited in claim 42, wherein if the one or more categories is not apermitted category, serving another advertisement to the marketingmedium.
 44. The computer-readable storage medium as recited in claim 42,further comprising instructions that, when executed, cause the at leastone processor to associate one or more categories received from anadvertiser with the first advertisement.
 45. The computer-readablestorage medium as recited in claim 43, wherein the block list comprisesa category of advertisements.
 46. The computer-readable storage mediumas recited in claim 45, wherein the one or more categories comprisemulti-levels of categories.
 47. The computer-readable storage medium asrecited in claim 46, wherein the one or more categories comprise asub-set of advertisements in a first category.
 48. The computer-readablestorage medium as recited in claim 42, further comprising instructionsthat, when executed, cause the at least one processor to maintain a userprofile, the user profile including user attributes reflecting interestsof the user.
 49. The computer-readable storage medium as recited in claim 48, further comprising instructions that, when executed, cause the at least one processor to update the user attributes of the user profile based on one or more of Internet sites visited by the user, advertisements selected by the user, or internet searching by the user.
 50. The computer-readable storage medium as recited in claim 48, further comprising instructions that, when executed, cause the at least one processor to maintain an advertisement profile for one or more advertisements of the plurality of advertisements, each advertisement profile including ad-attributes that reflect how much an advertisement correlates to a given user attribute.
 51. The computer-readable storage medium as recited in claim 50, further comprising instructions that, when executed, cause the at least one processor to determine an estimated selection probability for the one or more advertisements of the plurality of advertisements; wherein: the estimated selection probability indicates a likelihood that the user will select a given advertisement; and the estimated selection probability is a function of the consumer attributes associated with the user and the advertisement profile for the given advertisement.
 52. The computer-readable storage medium as recited in claim 51, wherein selecting a first advertisement related to the one or more interests comprises selecting an advertisement with the highest estimated selection probability.
 53. The computer-readable storage medium as recited in claim 42, wherein the marketing medium comprises a software program.
 54. The computer-readable storage medium as recited in claim 53, wherein the marketing medium comprises one or more websites.
 55. The computer-readable storage medium as recited in claim 53, wherein the marketing medium is a software program on a mobile device.
 56. The computer-readable storage medium as recited in claim 42, wherein the marketing medium comprises a mobile device.
 57. The computer-readable storage medium as recited in claim 42, wherein the one or more actions comprise accessing the marketing medium.
 58. A method comprising: receiving an advertisement request in response to one or more actions of a user on a marketing medium on a mobile device; determining, using at least one processor, an estimated selection probability for one or more advertisements of a plurality of advertisements, wherein the estimated selection probability indicates a likelihood that the user will select a given advertisement; identifying an advertisement with a high estimated selection probability; if the advertisement is a permitted advertisement, serving the advertisement to the mobile device.
 59. The method as recited in claim 58, further comprising, determining, using the at least one processor, if the advertisement is a permitted advertisement.
 60. The method as recited in claim 59, wherein determining if the advertisement is a permitted advertisement comprises determining if the advertisement is on a block list associated with the marketing medium.
 61. The method as recited in claim 59, wherein determining if the advertisement is a permitted advertisement comprises determining if the marketing medium is on a block list associated with the advertisement.
 62. The method as recited in claim 58, wherein the marketing medium comprises a software program.
 63. The method as recited in claim 62, wherein the marketing medium comprises a website.
 64. The method as recited in claim 62, further comprising maintaining a user profile, the user profile including user attributes reflecting interests of the user.
 65. The method as recited in claim 64, further comprising updating the user attributes of the user profile based on one or more of Internet sites visited by the user, advertisements selected by the user, or internet searching by the user.
 66. The method as recited in claim 65, further comprising maintaining an advertisement profile for the one or more advertisements of the plurality of advertisements, each advertisement profile including ad-attributes that reflect how much an advertisement correlates to a given user attribute.
 67. The method as recited in claim 66, wherein the estimated selection probability for a particular advertisement is a function of the consumer attributes of the user and the advertisement profile for the particular advertisement.
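
For illustration only, the flow recited in claims 58 through 61 and 64 through 67 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names Ad, Medium, estimate_selection_probability, is_permitted, and select_ad are hypothetical, and the scoring rule (a dot product of user attributes and ad-attributes, clipped to the range 0 to 1) is an assumed stand-in, since the claims say only that the probability is "a function of" the two. The sketch also applies both block-list checks together, whereas claims 60 and 61 recite them separately, and it falls back to the next-ranked advertisement when the top one is blocked, which is an added assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Ad:
    ad_id: str
    # Hypothetical ad-attribute profile: one weight per user attribute.
    profile: Dict[str, float]
    # Ids of marketing media on which this ad must not be served.
    blocked_media: Set[str] = field(default_factory=set)


@dataclass
class Medium:
    medium_id: str
    # Ids of ads that this marketing medium refuses to show.
    blocked_ads: Set[str] = field(default_factory=set)


def estimate_selection_probability(user_attrs: Dict[str, float], ad: Ad) -> float:
    """Assumed scoring rule: dot product of user and ad attributes, clipped to [0, 1]."""
    score = sum(w * user_attrs.get(name, 0.0) for name, w in ad.profile.items())
    return min(1.0, max(0.0, score))


def is_permitted(ad: Ad, medium: Medium) -> bool:
    """An ad is permitted if neither side's block list excludes the pairing."""
    return (ad.ad_id not in medium.blocked_ads
            and medium.medium_id not in ad.blocked_media)


def select_ad(user_attrs: Dict[str, float], ads: List[Ad], medium: Medium) -> Optional[Ad]:
    """Return the permitted ad with the highest estimated probability, or None."""
    ranked = sorted(
        ads,
        key=lambda ad: estimate_selection_probability(user_attrs, ad),
        reverse=True,
    )
    for ad in ranked:
        if is_permitted(ad, medium):
            return ad
    return None


if __name__ == "__main__":
    user = {"sports": 1.0, "travel": 0.2}
    ads = [
        Ad("a1", {"sports": 0.5}),
        Ad("a2", {"travel": 0.5}),
        Ad("a3", {"sports": 0.3, "travel": 0.5}),
    ]
    site = Medium("m1", blocked_ads={"a1"})
    chosen = select_ad(user, ads, site)
    print(chosen.ad_id if chosen else "no permitted ad")
```

In this sketch the highest-scoring advertisement is skipped because the medium blocks it, so the next-ranked permitted advertisement is served instead.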